Dataset Description
64:44:02 Hours | 7.1 GB | 233 Speakers | 26,223 Audio Segments | 16 kHz | 16-bit WAV
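The 16 kHz, 16-bit, mono format fixes the raw data rate at 16,000 × 2 × 1 = 32,000 bytes per second, so the total duration gives a rough size estimate. The following is a minimal Python sketch of that arithmetic. It assumes uncompressed PCM samples and ignores per-file WAV headers. The helper names are illustrative and are not part of any LDC-IL tool.

```python
# Assumed format parameters, taken from the dataset description line.
SAMPLE_RATE_HZ = 16_000   # 16 kHz
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # mono recordings


def hms_to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def raw_pcm_bytes(duration_s: int) -> int:
    """Size of uncompressed PCM samples only; WAV headers are ignored."""
    return duration_s * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


total_s = hms_to_seconds("64:44:02")
size = raw_pcm_bytes(total_s)
print(total_s)                    # 233042
print(f"{size / 1e9:.2f} GB")     # 7.46 GB (decimal gigabytes)
print(f"{size / 2**30:.2f} GiB")  # 6.95 GiB (binary gibibytes)
```

This estimate lands in the same range as the 7.1 GB listed. However, the description does not say which unit convention or overhead that figure includes, so the two numbers should not be expected to match exactly.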
Gujarati is one of the major literary languages of India. It is the official language of the state of Gujarat and of the union territories of Daman and Diu and Dadra and Nagar Haveli. For convenience, LDC-IL groups Gujarati into four dialects: South Gujarat, Central Gujarat, North Gujarat and Saurashtra.
LDC-IL has 64:44:02 hours of Gujarati raw speech data as mono recordings. The LDC-IL Gujarati Raw Speech dataset is made up of several types of material: word lists, sentences, texts and date formats. Approximately 15 minutes of speech per speaker was collected from 233 native speakers of Gujarati (124 female and 109 male) from different age groups. Each speaker recorded items randomly selected from a master dataset.
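Because every segment is a mono recording at 16 kHz with 16-bit samples, a quick header check can confirm that a file matches this specification before processing. Below is a minimal sketch using Python's standard `wave` module. The function name `describe_segment` and its return fields are hypothetical, and the actual file naming in the corpus is not described here.

```python
import wave

# Expected parameters, assumed from the dataset description.
EXPECTED_RATE = 16_000   # 16 kHz sampling rate
EXPECTED_WIDTH = 2       # bytes per sample (16-bit)
EXPECTED_CHANNELS = 1    # mono


def describe_segment(source) -> dict:
    """Read WAV header fields from a path or binary file object.

    Returns the segment duration and whether the sampling rate, sample
    width and channel count all match the corpus specification.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "duration_s": frames / rate,
        "matches_spec": (rate, width, channels)
        == (EXPECTED_RATE, EXPECTED_WIDTH, EXPECTED_CHANNELS),
    }
```

For example, `describe_segment("segment.wav")` (a placeholder path) returns the duration in seconds and a boolean flag. Only the header is inspected, so the check is cheap even across all 26,223 segments.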
Details of the available speech corpus:
Total speakers: 233 (124 female, 109 male)
| Domain | Audio Segments | Duration (hh:mm:ss) |
|---|---|---|
| Contemporary Text (News) | 233 | 12:52:46 |
| Creative Text | 232 | 13:30:15 |
| Sentence | 5824 | 7:12:17 |
| Date Format | 466 | 0:59:31 |
| Command and Control Words | 6985 | 9:43:07 |
| Person Name | 4644 | 8:34:44 |
| Place Name | 2322 | 3:17:06 |
| Phonetically Balanced | 4131 | 6:28:15 |
| Form and Function - Word | 1386 | 2:06:01 |
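As a consistency check, the per-domain figures in the table can be summed and compared with the headline totals of 26,223 segments and 64:44:02. Below is a short Python sketch of that check. The data list is transcribed by hand from the table, and the helper names are illustrative.

```python
# (domain, audio segments, duration hh:mm:ss), transcribed from the table above.
DOMAINS = [
    ("Contemporary Text (News)", 233, "12:52:46"),
    ("Creative Text", 232, "13:30:15"),
    ("Sentence", 5824, "7:12:17"),
    ("Date Format", 466, "0:59:31"),
    ("Command and Control Words", 6985, "9:43:07"),
    ("Person Name", 4644, "8:34:44"),
    ("Place Name", 2322, "3:17:06"),
    ("Phonetically Balanced", 4131, "6:28:15"),
    ("Form and Function - Word", 1386, "2:06:01"),
]
SPEAKERS = 233


def hms_to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total: int) -> str:
    """Format seconds as 'h:mm:ss' without wrapping hours at 24."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(segments for _, segments, _ in DOMAINS)
total_seconds = sum(hms_to_seconds(duration) for _, _, duration in DOMAINS)

print(total_segments)                                 # 26223
print(seconds_to_hms(total_seconds))                  # 64:44:02
print(round(total_seconds / SPEAKERS / 60, 1))        # 16.7 (minutes per speaker)
```

Both sums match the headline figures. Dividing the total duration by the 233 speakers gives an average of about 16.7 minutes per speaker, slightly above the approximate 15 minutes stated in the description.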
A detailed explanation of the Gujarati Raw Speech Corpus (Mono Recordings) will be available in the Gujarati Raw Speech (Mono Recordings) Documentation.
For research citations, please use the following:
- Ramamoorthy L., Narayan Kumar Choudhary, Mona Parakh, Rejitha K.S., Rajesha N., Manasa G. 2021. Gujarati Raw Speech Corpus (Mono Recordings). Central Institute of Indian Languages, Mysore.
- Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.
Item specifics
- Authors: Ramamoorthy L., Narayan Kumar Choudhary, Mona Parakh, Rejitha K.S., Rajesha N., Manasa G.
- Corpus Type: Raw Corpus
- Catalogue Number: 1277
- ISBN: 978-81-948885-8-1
- Data Source: On Field
- Duration: 64:44:02
- Number of Audio Segments: 26,223
- Release Date: 15-Jun-2021
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.